In this paper, we present a cross-modal recipe retrieval framework, Transformer-based Network for Large Batch Training (TNLBT), inspired by ACME (Adversarial Cross-Modal Embedding) and H-T (Hierarchical Transformer). TNLBT aims to accomplish retrieval tasks while generating images from recipe embeddings. We apply a Hierarchical Transformer-based recipe text encoder, a Vision Transformer (ViT)-based recipe image encoder, and an adversarial network architecture to enable better cross-modal embedding learning for recipe texts and images. In addition, we use self-supervised learning to exploit the rich information in recipe texts that have no corresponding images. Since recent literature on self-supervised learning suggests that contrastive learning benefits from larger batch sizes, we adopt a large batch size during training and validate its effectiveness. In the experiments, the proposed framework significantly outperformed the current state-of-the-art frameworks in both cross-modal recipe retrieval and image generation tasks on the Recipe1M benchmark. This is the first work to confirm the effectiveness of large batch training for cross-modal recipe embedding.
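The benefit of large-batch contrastive training comes from the number of in-batch negatives growing with the batch size. Below is a minimal PyTorch sketch of a symmetric InfoNCE-style loss over paired image/recipe embeddings; the function name, temperature value, and variable names are illustrative assumptions, not taken from the TNLBT implementation:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: each image treats its paired recipe as the
    positive and the other (batch_size - 1) recipes as negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Cross-entropy in both retrieval directions (image-to-recipe, recipe-to-image).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

With batch size B, each anchor sees B - 1 in-batch negatives, which is why enlarging the batch tends to tighten the learned embedding space.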
Human-object interaction (HOI) detection, as a downstream task of object detection, requires localizing humans and objects and extracting the semantic relationships between them from an image. Recently, one-stage methods have become a new trend for this task because of their high efficiency. However, these methods focus on detecting possible interaction points or filtering human-object pairs, ignoring the variability in the location and size of different objects across spatial scales. To address this problem, we propose a transformer-based method, QAHOI (Query-Based Anchors for Human-Object Interaction detection), which leverages a multi-scale architecture to extract features from different spatial scales and uses query-based anchors to predict all the elements of an HOI instance. We further investigate using a powerful backbone, which significantly improves the accuracy of QAHOI; with a transformer-based backbone, QAHOI outperforms recent state-of-the-art methods on the HICO-DET benchmark. The source code is available at https://github.com/cjw2021/qhoii.
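The query-based anchor idea can be made concrete with a small sketch: each decoded query embedding is mapped by lightweight heads to every element of one HOI instance (human box, object box, object class, action class). This is a simplified illustration; the layer sizes, names, and box parameterization are assumptions, not taken from the QAHOI code:

```python
import torch
import torch.nn as nn

class HOIQueryHead(nn.Module):
    """Decodes each query embedding into the elements of one HOI instance."""

    def __init__(self, hidden_dim, num_obj_classes, num_actions):
        super().__init__()
        self.human_box = nn.Linear(hidden_dim, 4)   # (cx, cy, w, h), normalized
        self.object_box = nn.Linear(hidden_dim, 4)
        self.obj_cls = nn.Linear(hidden_dim, num_obj_classes)
        self.action_cls = nn.Linear(hidden_dim, num_actions)

    def forward(self, queries):  # queries: (num_queries, batch, hidden_dim)
        return {
            "human_boxes": self.human_box(queries).sigmoid(),
            "object_boxes": self.object_box(queries).sigmoid(),
            "object_logits": self.obj_cls(queries),
            "action_logits": self.action_cls(queries),
        }
```

Because every query predicts a complete instance, no hand-crafted interaction-point matching or pair filtering step is needed.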
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest-error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting it does; the system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine-learning-based predictors from prior work, Acela reduces the number of servers taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
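Quantile regression biases a model toward overprediction by penalizing underprediction more heavily, which matches the asymmetric cost structure described above. A minimal sketch of the pinball (quantile) loss follows; the quantile value is illustrative and Acela's actual model configuration is not specified here:

```python
import torch

def pinball_loss(pred, target, quantile=0.9):
    """Pinball loss for quantile regression. With quantile > 0.5,
    underprediction (target > pred) is weighted by `quantile` while
    overprediction is weighted by `1 - quantile`, biasing the model
    toward overestimating job durations."""
    diff = target - pred
    return torch.mean(torch.maximum(quantile * diff, (quantile - 1) * diff))
```

Training at, say, the 0.9 quantile means roughly nine in ten predictions exceed the true duration, trading a small amount of extra scheduled time for far fewer budget overruns.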
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone. Addressing the problem of extracting factual knowledge from pretrained language models (PLMs), we focus on simple data statistics such as co-occurrence counts and show that these statistics do influence the predictions of PLMs, suggesting that such models rely on shallow heuristics. Our causal framework and results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.
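The kind of data statistic studied here can be illustrated with a small probe: count how often a subject and a candidate object co-occur in a corpus, then check whether a masked language model's scores track those counts. The toy corpus, probing template, and model choice below are illustrative assumptions, not the paper's exact setup:

```python
from transformers import pipeline

# Toy corpus; in the paper's setting this would be the pretraining data.
corpus = [
    "dante was born in florence",
    "dante lived in florence",
    "dante visited rome",
]

def cooccurrence(subject, obj, corpus):
    """Count sentences in which the subject and candidate object co-occur."""
    return sum(subject in s and obj in s for s in corpus)

fill = pipeline("fill-mask", model="bert-base-uncased")
candidates = ["florence", "rome"]

# Compare model scores for each candidate with its co-occurrence count.
for result in fill("Dante was born in [MASK].", targets=candidates):
    obj = result["token_str"].strip()
    print(obj, result["score"], cooccurrence("dante", obj, corpus))
```

If the model's score ordering follows the co-occurrence counts rather than the facts, that is the kind of shallow heuristic the causal analysis is designed to expose.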
A major challenge in attribute value extraction (AVE) for e-commerce sites is how to handle the enormous number of attributes across diverse products. Although this challenge can be addressed by a question answering (QA) approach, which finds the value for a given query (attribute) in the product data, it does not work effectively for rare and ambiguous queries. We therefore propose a simple knowledge-driven query expansion based on the possible answers (values) of a query (attribute) for QA-based AVE. We retrieve the values of a query (attribute) from the training data to expand the query. We train a model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge at test time. Experimental results on our cleaned version of the AliExpress dataset show that our method improves the performance of AVE (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).
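The expansion-and-corruption idea can be sketched as follows: append training-set values of the attribute to the query, randomly drop some of them (knowledge dropout), and occasionally swap one for a value of an unrelated attribute (knowledge token mixing) so the model does not over-trust the hint. The function name, probabilities, and expansion format below are illustrative assumptions:

```python
import random

def expand_query(attribute, value_knowledge, other_values,
                 p_drop=0.3, p_mix=0.1):
    """Build an expanded QA query for attribute value extraction.

    value_knowledge: values of this attribute seen in the training data.
    other_values: values of unrelated attributes, used for token mixing.
    """
    expansion = []
    for value in value_knowledge:
        if random.random() < p_drop:       # knowledge dropout
            continue
        if random.random() < p_mix:        # knowledge token mixing
            value = random.choice(other_values)
        expansion.append(value)
    return f"{attribute} (e.g., {', '.join(expansion)})" if expansion else attribute

# Example: expanding the rare attribute "plug type" with known values.
print(expand_query("plug type", ["EU", "US", "UK"], ["red", "cotton"]))
```

Because the expansion is sometimes incomplete or noisy during training, the model learns to treat the retrieved values as a hint rather than a ground-truth answer list.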